2. Data pre-processing
• Data pre-processing is one of the most important steps in machine learning.
• The quality of the input data largely determines how accurate the resulting machine learning model can be.
• In machine learning, there is a commonly cited 80/20 rule: a data scientist spends roughly 80% of the time on data
pre-processing and 20% of the time on actually performing the analysis.
• Data pre-processing is the process of cleaning raw data, i.e. data collected in the real world is converted into a
clean data set.
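The conversion from raw data to a clean data set can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the records in raw_rows, the column names, and the helper functions clean_row and clean_dataset are all hypothetical, and dropping unusable rows is just one possible strategy (filling in missing values is another).

```python
# Hypothetical raw records, as they might arrive from a real-world source.
raw_rows = [
    {"age": " 34", "income": "52000", "gender": "Female "},
    {"age": "", "income": "48000", "gender": "male"},        # missing age
    {"age": "41", "income": "n/a", "gender": "Male"},        # unparseable income
    {"age": " 34", "income": "52000", "gender": "Female "},  # duplicate of first row
    {"age": "29", "income": "61000", "gender": "FEMALE"},
]


def clean_row(row):
    """Return a cleaned copy of row, or None if a numeric value is unusable."""
    try:
        age = int(row["age"].strip())
        income = float(row["income"].strip())
    except ValueError:
        # Empty strings and placeholders such as "n/a" cannot be parsed.
        return None
    # Normalise text so "Female " and "FEMALE" become the same category.
    gender = row["gender"].strip().lower()
    return {"age": age, "income": income, "gender": gender}


def clean_dataset(rows):
    """Clean every row, dropping unusable rows and exact duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        result = clean_row(row)
        if result is None:
            continue
        key = (result["age"], result["income"], result["gender"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(result)
    return cleaned


clean_rows = clean_dataset(raw_rows)
for row in clean_rows:
    print(row)
```

Of the five raw records, only two survive: the rows with a missing age or an unparseable income are dropped, and the repeated record is kept once.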
Types of Data
1. Numeric e.g. income, age
2. Categorical:- sometimes called qualitative data; data whose values describe some characteristic or category, with no
inherent order (e.g. gender, nationality)
3. Ordinal:- categorical data whose values follow a natural order. The main feature of ordinal data is that the values
can be ranked, but the difference between data values cannot be determined (e.g. low/medium/high, socio-economic status
(“low income”, “middle income”, “high income”), education level (“high school”, “BS”, “MS”, “PhD”), income level
(“less than 50K”, “50K-100K”, “over 100K”), satisfaction rating (“extremely dislike”, “dislike”, “neutral”, “like”,
“extremely like”)).
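The three data types above are usually prepared differently before modelling. The following is a minimal sketch in plain Python, not a definitive recipe: the records, the names EDUCATION_ORDER, EDUCATION_RANK, one_hot and encode, and the choice of one-hot and rank encoding are all assumptions made for illustration.

```python
# Hypothetical records with one column of each data type.
records = [
    {"age": 25, "nationality": "India", "education": "BS"},
    {"age": 40, "nationality": "Japan", "education": "PhD"},
    {"age": 31, "nationality": "India", "education": "high school"},
]

# Ordinal: the order matters, so map each level to its rank.
# The ranks carry order only; the gap between two ranks has no meaning.
EDUCATION_ORDER = ["high school", "BS", "MS", "PhD"]
EDUCATION_RANK = {level: rank for rank, level in enumerate(EDUCATION_ORDER)}

# Categorical: no order, so use one indicator (0/1) column per category.
nationalities = sorted({r["nationality"] for r in records})


def one_hot(value, categories):
    """Return a list with 1 at the position of value and 0 elsewhere."""
    return [1 if value == category else 0 for category in categories]


def encode(record):
    """Numeric stays as-is, categorical is one-hot, ordinal becomes a rank."""
    return [
        record["age"],
        *one_hot(record["nationality"], nationalities),
        EDUCATION_RANK[record["education"]],
    ]


encoded = [encode(r) for r in records]
for row in encoded:
    print(row)
```

Each encoded row holds the age, one indicator per nationality, and the education rank. Giving nationality a single integer code instead would suggest an order that the data does not have.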